Legal Judgment Prediction
Structured Definitions and Segmentations for Legal Reasoning in LLMs: A Study on Indian Legal Data
Khatri, Mann, Yusuf, Mirza, Shah, Rajiv Ratn, Kumaraguru, Ponnurangam
Large Language Models (LLMs), trained on extensive datasets from the web, exhibit remarkable general reasoning skills. Despite this, they often struggle in specialized areas like law, mainly because they lack domain-specific pretraining. The legal field presents unique challenges, as legal documents are generally long and intricate, making it hard for models to process the full text efficiently. Previous studies have examined in-context approaches to address the knowledge gap, boosting model performance in new domains without full domain alignment. In our paper, we analyze model behavior on legal tasks by conducting experiments in three areas: (i) reorganizing documents based on rhetorical roles to assess how structured information affects long context processing and model decisions, (ii) defining rhetorical roles to familiarize the model with legal terminology, and (iii) emulating the step-by-step reasoning of courts regarding rhetorical roles to enhance model reasoning. These experiments are conducted in a zero-shot setting across three Indian legal judgment prediction datasets. Our results reveal that organizing data or explaining key legal terms significantly boosts model performance, with a minimum increase of ~1.5% and a maximum improvement of 4.36% in F1 score compared to the baseline.
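As a toy illustration of experiment (i), a judgment's sentences can be regrouped into contiguous rhetorical-role sections before prompting. This is only a sketch; the role names and sentences below are hypothetical, not the paper's actual taxonomy:

```python
# Hypothetical sketch: regroup (role, sentence) pairs into contiguous
# role sections so the model sees structured rather than interleaved text.
from collections import defaultdict

# Invented role order; the paper's rhetorical-role scheme may differ.
ROLE_ORDER = ["Facts", "Arguments", "Precedent", "Ratio", "Ruling"]

def reorganize_by_role(sentences):
    """Group (role, sentence) pairs into one section per rhetorical role."""
    sections = defaultdict(list)
    for role, sent in sentences:
        sections[role].append(sent)
    parts = []
    for role in ROLE_ORDER:
        if sections[role]:
            parts.append(role.upper() + ":\n" + " ".join(sections[role]))
    return "\n\n".join(parts)

doc = [("Ruling", "The appeal is allowed."),
       ("Facts", "The appellant was convicted under Section 302."),
       ("Facts", "Bail was denied by the trial court.")]
prompt_body = reorganize_by_role(doc)
```

The reordered text then serves as the document portion of a zero-shot prompt, with facts preceding the ruling regardless of their order in the source judgment.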
- North America > Canada > Alberta > Census Division No. 13 > Westlock County (0.24)
- North America > Canada > Alberta > Census Division No. 11 > Sturgeon County (0.24)
- Europe > United Kingdom (0.14)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.71)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Judging by Appearances? Auditing and Intervening Vision-Language Models for Bail Prediction
Basu, Sagnik, Prakash, Shubham, Barge, Ashish Maruti, Jaiswal, Siddharth D, Dash, Abhisek, Ghosh, Saptarshi, Mukherjee, Animesh
Large language models (LLMs) have been extensively used for legal judgment prediction tasks based on case reports and crime history. However, with a surge in the availability of large vision language models (VLMs), legal judgment prediction systems can now be made to leverage the images of the criminals in addition to the textual case reports/crime history. Applications built in this way could lead to inadvertent consequences and be used with malicious intent. In this work, we run an audit to investigate the efficacy of standalone VLMs on the bail decision prediction task. We observe that the performance is poor across multiple intersectional groups and that models wrongly deny bail to deserving individuals with very high confidence. We design different intervention algorithms by first including legal precedents through a RAG pipeline and then fine-tuning the VLMs using innovative schemes. We demonstrate that these interventions substantially improve the performance of bail prediction. Our work paves the way for the design of smarter interventions on VLMs in the future, before they can be deployed for real-world legal judgment prediction.
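The precedent-retrieval step of such a RAG intervention can be sketched as a nearest-neighbor lookup over precomputed embeddings. The corpus, case names, and toy vectors below are placeholders, not the paper's actual pipeline:

```python
# Illustrative sketch of retrieving top-k legal precedents by cosine
# similarity. Embeddings here are tiny hand-made vectors; a real pipeline
# would use an embedding model over precedent texts.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k_precedents(query_vec, precedents, k=2):
    """precedents: list of (case_id, embedding); return k most similar ids."""
    ranked = sorted(precedents, key=lambda p: cosine(query_vec, p[1]), reverse=True)
    return [case_id for case_id, _ in ranked[:k]]

# Placeholder corpus; case names are invented.
corpus = [("A v. State", [1.0, 0.0]),
          ("B v. State", [0.9, 0.1]),
          ("C v. State", [0.0, 1.0])]
hits = top_k_precedents([1.0, 0.05], corpus, k=2)
```

The retrieved precedents would then be prepended to the VLM prompt alongside the case report before the bail decision is requested.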
- Oceania > Australia (0.14)
- North America > United States > Illinois (0.05)
- Europe > Germany > Saarland > Saarbrücken (0.04)
- Law > Criminal Law (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
GLARE: Agentic Reasoning for Legal Judgment Prediction
Yang, Xinyu, Deng, Chenlong, Dou, Zhicheng
Legal judgment prediction (LJP) has become increasingly important in the legal field. In this paper, we identify that existing large language models (LLMs) have significant problems of insufficient reasoning due to a lack of legal knowledge. Therefore, we introduce GLARE, an agentic legal reasoning framework that dynamically acquires key legal knowledge by invoking different modules, thereby improving the breadth and depth of reasoning. Experiments conducted on a real-world dataset verify the effectiveness of our method. Furthermore, the reasoning chain generated during the analysis process can increase interpretability and provide the possibility for practical applications.
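A minimal sketch of such an agentic control loop, which keeps invoking knowledge modules until the agent judges its context sufficient, might look like the following. The module names and the stopping rule here are invented for illustration:

```python
# Hypothetical agentic loop: call knowledge-acquisition modules on demand,
# collecting context for the final judgment. Modules are stand-in lambdas,
# not GLARE's actual components.
def agentic_ljp(question, modules, needs_more, max_calls=3):
    """Invoke modules in order until the agent decides it has enough context."""
    context = []
    for name, fetch in modules:
        if len(context) >= max_calls or not needs_more(context):
            break
        context.append((name, fetch(question)))
    return context

# Placeholder modules returning canned knowledge.
modules = [("statute_lookup", lambda q: "Article 234"),
           ("precedent_search", lambda q: "Case 2019-112")]
# Toy stopping rule: stop once two pieces of context are gathered.
trace = agentic_ljp("intentional injury", modules, needs_more=lambda c: len(c) < 2)
```

The collected trace doubles as the interpretable reasoning chain the abstract mentions: each entry records which module was consulted and what it returned.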
- North America > Canada > Alberta > Census Division No. 13 > Westlock County (0.24)
- North America > Canada > Alberta > Census Division No. 11 > Sturgeon County (0.24)
- North America > United States (0.04)
- Overview (0.93)
- Research Report > New Finding (0.46)
The Judge Variable: Challenging Judge-Agnostic Legal Judgment Prediction
This study examines the role of human judges in legal decision-making by using machine learning to predict child physical custody outcomes in French appellate courts. Building on the legal realism-formalism debate, we test whether individual judges' decision-making patterns significantly influence case outcomes, challenging the assumption that judges are neutral variables who apply the law uniformly. To ensure compliance with French privacy laws, we implement a strict pseudonymization process. Our analysis uses 18,937 living-arrangement rulings extracted from 10,306 cases. We compare models trained on individual judges' past rulings (specialist models) with a judge-agnostic model trained on aggregated data (generalist model). The prediction pipeline is a hybrid approach combining large language models (LLMs) for structured feature extraction and ML models (RF, XGB, and SVC) for outcome prediction. Our results show that specialist models consistently achieve higher predictive accuracy than the generalist model, with top-performing models reaching F1 scores as high as 92.85%, compared to 82.63% for the generalist model despite its being trained on 20x to 100x more samples. Specialist models capture stable individual patterns that are not transferable to other judges. In-domain and cross-domain validity tests provide empirical support for legal realism, demonstrating that judicial identity plays a measurable role in legal outcomes. All data and code used will be made available.
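The specialist-versus-generalist contrast can be illustrated with a deliberately trivial stand-in: a majority-class predictor per judge versus one pooled predictor. This is not the paper's RF/XGB/SVC pipeline, and the rulings below are invented:

```python
# Toy contrast between per-judge "specialist" models and one pooled
# "generalist" model. The majority-class predictor and the rulings are
# invented stand-ins, not the paper's pipeline or data.
from collections import Counter

def majority_model(labels):
    """A trivial 'model': always predict the most frequent past outcome."""
    return Counter(labels).most_common(1)[0][0]

rulings = {
    "judge_1": ["sole_custody", "sole_custody", "shared"],
    "judge_2": ["shared", "shared", "shared"],
}

# One specialist per judge vs. a single generalist over the pooled data.
specialists = {j: majority_model(ls) for j, ls in rulings.items()}
generalist = majority_model([l for ls in rulings.values() for l in ls])
# judge_1's specialist disagrees with the generalist, illustrating how
# pooling can wash out stable individual decision patterns.
```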
- South America > French Guiana > Guyane > Cayenne (0.04)
- Oceania > New Caledonia > South Province > Noumea (0.04)
- Oceania > French Polynesia > Windward Islands > Papeete (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Law > Litigation (1.00)
- Law > Government & the Courts (0.94)
- Law > Criminal Law (0.93)
MultiJustice: A Chinese Dataset for Multi-Party, Multi-Charge Legal Prediction
Wang, Xiao, Pei, Jiahuan, Shui, Diancheng, Han, Zhiguang, Sun, Xin, Zhu, Dawei, Shen, Xiaoyu
Legal judgment prediction (LJP) offers a compelling method to aid legal practitioners and researchers. However, the research question remains relatively underexplored: Should multiple defendants and charges be treated separately in LJP? To address this, we introduce a new dataset, namely multi-person multi-charge prediction (MPMCP), and seek the answer by evaluating the performance of several prevailing legal large language models (LLMs) on four practical legal judgment scenarios: (S1) single defendant with a single charge, (S2) single defendant with multiple charges, (S3) multiple defendants with a single charge, and (S4) multiple defendants with multiple charges. We evaluate the dataset across two LJP tasks, i.e., charge prediction and penalty term prediction. We have conducted extensive experiments and found that the scenario involving multiple defendants and multiple charges (S4) poses the greatest challenges, followed by S2, S3, and S1. The impact varies significantly depending on the model. For example, in S4 compared to S1, InternLM2 achieves approximately 4.5% lower F1-score and 2.8% higher LogD, while Lawformer demonstrates around 19.7% lower F1-score and 19.0% higher LogD.
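Assigning a case to one of the four scenarios reduces to counting defendants and charges. A minimal sketch, with the input semantics assumed:

```python
# Illustrative bucketing of a case into the four MPMCP scenarios (S1-S4)
# from its defendant and charge counts; the scenario labels follow the
# abstract, the function itself is an assumption.
def scenario(n_defendants, n_charges):
    """Map (defendant count, charge count) to the S1-S4 scenario label."""
    if n_defendants == 1:
        return "S1" if n_charges == 1 else "S2"
    return "S3" if n_charges == 1 else "S4"
```

Such a bucketing would let per-scenario metrics (F1, LogD) be computed separately, which is how the abstract's S4 > S2 > S3 > S1 difficulty ordering is established.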
- Europe > Netherlands > North Holland > Amsterdam (0.05)
- Europe > Germany > Saarland (0.05)
- Asia > China > Hubei Province > Wuhan (0.04)
- Research Report > New Finding (0.66)
- Research Report > Experimental Study (0.48)
- Law > Criminal Law (1.00)
- Information Technology > Security & Privacy (0.94)
LegalReasoner: Step-wised Verification-Correction for Legal Judgment Reasoning
Shi, Weijie, Zhu, Han, Ji, Jiaming, Li, Mengze, Zhang, Jipeng, Zhang, Ruiyuan, Zhu, Jia, Xu, Jiajie, Han, Sirui, Guo, Yike
Legal judgment prediction (LJP) aims to function as a judge by making final rulings based on case claims and facts, which plays a vital role in the judicial domain for supporting court decision-making and improving judicial efficiency. However, existing methods often struggle with logical errors when conducting complex legal reasoning. We propose LegalReasoner, which enhances LJP reliability through step-wise verification and correction of the reasoning process. Specifically, it first identifies dispute points to decompose complex cases, and then conducts step-wise reasoning while employing a process verifier to validate each step's logic from the perspectives of correctness, progressiveness, and potential. When errors are detected, expert-designed attribution and resolution strategies are applied for correction. To fine-tune LegalReasoner, we release the LegalHK dataset, containing 58,130 Hong Kong court cases with detailed annotations of dispute points, step-by-step reasoning chains, and process verification labels. Experiments demonstrate that LegalReasoner significantly improves concordance with court decisions from 72.37 to 80.27 on LLAMA-3.1-70B. The data is available at https://huggingface.co/datasets/weijiezz/LegalHK.
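The verify-then-correct loop can be sketched generically. The verifier and correction strategy below are trivial stand-ins for the paper's trained process verifier and expert-designed strategies:

```python
# Schematic step-wise verify-and-correct loop in the spirit of
# LegalReasoner; verify/correct are stand-in callables, not the paper's
# trained process verifier or attribution strategies.
def reason_with_verification(steps, verify, correct, max_retries=2):
    """Run each reasoning step past a verifier; on failure, apply a
    correction strategy up to max_retries times before moving on."""
    chain = []
    for step in steps:
        attempt, tries = step, 0
        while not verify(attempt) and tries < max_retries:
            attempt = correct(attempt)
            tries += 1
        chain.append(attempt)
    return chain

# Toy verifier: a step is "valid" if it cites a source;
# toy corrector: append a citation marker.
verify = lambda s: "[cite]" in s
correct = lambda s: s + " [cite]"
chain = reason_with_verification(["claim A [cite]", "claim B"], verify, correct)
```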
- South America > Brazil (0.04)
- Europe > United Kingdom (0.04)
- Asia > China > Hong Kong > Kowloon (0.04)
- Workflow (0.67)
- Research Report (0.64)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.90)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)
RLJP: Legal Judgment Prediction via First-Order Logic Rule-enhanced with Large Language Models
Zhang, Yue, Tian, Zhiliang, Zhou, Shicheng, Wang, Haiyang, Hou, Wenqing, Liu, Yuying, Zhao, Xuechen, Huang, Minlie, Wang, Ye, Zhou, Bin
Legal Judgment Prediction (LJP) is a pivotal task in legal AI. Existing semantic-enhanced LJP models integrate judicial precedents and legal knowledge for high performance. However, they neglect legal reasoning logic, a critical component of legal judgments requiring rigorous logical analysis. Although some approaches utilize legal reasoning logic for high-quality predictions, their logic rigidity hinders adaptation to case-specific logical frameworks, particularly in complex cases that are lengthy and detailed. This paper proposes a rule-enhanced legal judgment prediction framework based on first-order logic (FOL) formalism and contrastive learning (CL) to develop an adaptive adjustment mechanism for legal judgment logic and further enhance performance in LJP. Inspired by the process of human exam preparation, our method follows a three-stage approach: first, we initialize judgment rules using the FOL formalism to capture complex reasoning logic accurately; next, we propose a Confusion-aware Contrastive Learning (CACL) to dynamically optimize the judgment rules through a quiz consisting of confusable cases; finally, we utilize the optimized judgment rules to predict legal judgments. Experimental results on two public datasets show superior performance across all metrics. The code is publicly available at https://anonymous.4open.science/r/RLJP-FDF1.
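The first stage, initializing judgment rules in FOL form, can be caricatured as predicate checks over extracted case facts. The rule and predicates below are invented for illustration and do not come from the paper's rule set:

```python
# Toy first-order-logic-style judgment rule; the predicates and the rule
# itself are invented, shown only to convey the FOL-formalism idea.
def rule_aggravated_theft(facts):
    """theft(x) AND value_over_threshold(x) -> aggravated_theft(x)"""
    return bool(facts.get("theft", False) and facts.get("value_over_threshold", False))

case = {"theft": True, "value_over_threshold": True}
prediction = "aggravated_theft" if rule_aggravated_theft(case) else "theft"
```

In the paper's framework such rules are not fixed: the CACL stage adjusts them against confusable cases before they are applied to prediction.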
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Austria > Vienna (0.14)
- North America > Canada (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.86)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.60)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Adaptive Sentencing Prediction with Guaranteed Accuracy and Legal Interpretability
Jin, Yifei, Zheng, Xin, Guo, Lei
Existing research on judicial sentencing prediction predominantly relies on end-to-end models, which often neglect the inherent sentencing logic and lack interpretability, a critical requirement for both scholarly research and judicial practice. To address this challenge, we make three key contributions: First, we propose a novel Saturated Mechanistic Sentencing (SMS) model, which provides inherent legal interpretability by virtue of its foundation in China's Criminal Law. We also introduce the corresponding Momentum Least Mean Squares (MLMS) adaptive algorithm for this model. Second, for the adaptive sentencing predictor based on the MLMS algorithm, we establish a mathematical theory on the accuracy of adaptive prediction without resorting to any stationarity or independence assumptions on the data. We also provide a best possible upper bound for the prediction accuracy achievable by the best predictor designed in the known-parameters case. Third, we construct a Chinese Intentional Bodily Harm (CIBH) dataset. Utilizing this real-world data, extensive experiments demonstrate that our approach achieves a prediction accuracy close to the best possible theoretical upper bound, validating both the model's suitability and the algorithm's accuracy.
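For flavor only, a generic momentum-augmented LMS update for a linear predictor is sketched below. The paper's saturated mechanistic model and exact MLMS update are not reproduced; the step sizes and data are illustrative:

```python
# Generic momentum-LMS update for a linear predictor: an LMS gradient step
# driven by the prediction error, plus a momentum term. This is a textbook
# construction, not the paper's MLMS algorithm.
def mlms_step(w, w_prev, x, y, mu=0.1, beta=0.5):
    """One update of weights w given input x and target y."""
    y_hat = sum(wi * xi for wi, xi in zip(w, x))
    e = y - y_hat
    momentum = [beta * (wi - wpi) for wi, wpi in zip(w, w_prev)]
    w_new = [wi + mu * e * xi + mi for wi, xi, mi in zip(w, x, momentum)]
    return w_new, w  # new weights, and current weights as next step's w_prev

# Toy run: fit sum(w) toward the target 3.0 on a constant input.
w, w_prev = [0.0, 0.0], [0.0, 0.0]
for _ in range(20):
    w, w_prev = mlms_step(w, w_prev, [1.0, 1.0], 3.0)
```

After a few iterations the predicted value approaches the target, which is the sense in which such adaptive updates track the data without stationarity assumptions.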
- Asia > China > Beijing > Beijing (0.04)
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Data Science (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Debate-Feedback: A Multi-Agent Framework for Efficient Legal Judgment Prediction
Chen, Xi, Mao, Mao, Li, Shuo, Shangguan, Haotian
The use of AI in legal analysis and prediction (LegalAI) has gained widespread attention, with past research focusing on retrieval-based methods and fine-tuning large models. However, these approaches often require large datasets and underutilize the capabilities of modern large language models (LLMs). In this paper, inspired by the debate phase of real courtroom trials, we propose a novel legal judgment prediction model based on the Debate-Feedback architecture, which integrates LLM multi-agent debate and reliability evaluation models. Unlike traditional methods, our model achieves significant improvements in efficiency by minimizing the need for large historical datasets, thus offering a lightweight yet robust solution. Comparative experiments show that it outperforms several general-purpose and domain-specific legal models, offering a dynamic reasoning process and a promising direction for future LegalAI research.
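The aggregation side of such a debate can be sketched as reliability-weighted voting over the agents' final verdicts. The agents, votes, and reliability weights below are placeholders, not the paper's evaluation models:

```python
# Sketch of reliability-weighted aggregation after a debate round, echoing
# the Debate-Feedback idea; agents and weights are invented placeholders.
def debate_verdict(agent_votes, reliability):
    """agent_votes: {agent: verdict}; reliability: {agent: weight in [0, 1]}.
    Return the verdict with the highest total reliability-weighted support."""
    scores = {}
    for agent, verdict in agent_votes.items():
        scores[verdict] = scores.get(verdict, 0.0) + reliability.get(agent, 0.5)
    return max(scores, key=scores.get)

votes = {"agent_a": "guilty", "agent_b": "not guilty", "agent_c": "guilty"}
weights = {"agent_a": 0.9, "agent_b": 0.8, "agent_c": 0.4}
final = debate_verdict(votes, weights)
```

In a full pipeline this aggregation would follow one or more debate rounds in which agents see and rebut each other's arguments, with the reliability model scoring each agent's output.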
- North America > United States > Tennessee > Davidson County > Nashville (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > Canada > Alberta > Census Division No. 13 > Westlock County (0.04)
AnnoCaseLaw: A Richly-Annotated Dataset For Benchmarking Explainable Legal Judgment Prediction
Sesodia, Magnus, Petrova, Alina, Armour, John, Lukasiewicz, Thomas, Camburu, Oana-Maria, Dokania, Puneet K., Torr, Philip, de Witt, Christian Schroeder
Legal systems worldwide continue to struggle with overwhelming caseloads, limited judicial resources, and growing complexities in legal proceedings. Artificial intelligence (AI) offers a promising solution, with Legal Judgment Prediction (LJP) -- the practice of predicting a court's decision from the case facts -- emerging as a key research area. However, existing datasets often formulate the task of LJP unrealistically, not reflecting its true difficulty. They also lack high-quality annotation essential for legal reasoning and explainability. To address these shortcomings, we introduce AnnoCaseLaw, a first-of-its-kind dataset of 471 meticulously annotated U.S. Appeals Court negligence cases. Each case is enriched with comprehensive, expert-labeled annotations that highlight key components of judicial decision making, along with relevant legal concepts. Our dataset lays the groundwork for more human-aligned, explainable LJP models. We define three legally relevant tasks: (1) judgment prediction; (2) concept identification; and (3) automated case annotation, and establish a performance baseline using industry-leading large language models (LLMs). Our results demonstrate that LJP remains a formidable task, with application of legal precedent proving particularly difficult. Code and data are available at https://github.com/anonymouspolar1/annocaselaw.
- North America > United States (1.00)
- Europe > United Kingdom (0.28)
- Asia > Middle East > UAE (0.14)
- Law > Litigation (1.00)
- Government > Regional Government > North America Government > United States Government (0.46)